Probabilistic Search for Structured Data via Probabilistic Programming and Nonparametric Bayes
نویسندگان
چکیده
Databases are widespread, yet extracting relevant data can be difficult. Without substantial domain knowledge, multivariate search queries often return sparse or uninformative results. This paper introduces an approach for searching structured data based on probabilistic programming and nonparametric Bayes. Users specify queries in a probabilistic language that combines standard SQL database search operators with an information theoretic ranking function called predictive relevance. Predictive relevance can be calculated by a fast sparse matrix algorithm based on posterior samples from CrossCat, a nonparametric Bayesian model for high-dimensional, heterogeneously-typed data tables. The result is a flexible search technique that applies to a broad class of information retrieval problems, which we integrate into BayesDB, a probabilistic programming platform for probabilistic data analysis. This paper demonstrates applications to databases of US colleges, global macroeconomic indicators of public health, and classic cars. We found that human evaluators often prefer the results from probabilistic search to results from a standard baseline.
منابع مشابه
A Probabilistic Programming Approach To Probabilistic Data Analysis
Probabilistic techniques are central to data analysis, but different approaches can be challenging to apply, combine, and compare. This paper introduces composable generative population models (CGPMs), a computational abstraction that extends directed graphical models and can be used to describe and compose a broad class of probabilistic data analysis techniques. Examples include discriminative...
متن کاملHybrid Probabilistic Search Methods for Simulation Optimization
Discrete-event simulation based optimization is the process of finding the optimum design of a stochastic system when the performance measure(s) could only be estimated via simulation. Randomness in simulation outputs often challenges the correct selection of the optimum. We propose an algorithm that merges Ranking and Selection procedures with a large class of random search methods for continu...
متن کاملSupport vector regression with random output variable and probabilistic constraints
Support Vector Regression (SVR) solves regression problems based on the concept of Support Vector Machine (SVM). In this paper, a new model of SVR with probabilistic constraints is proposed that any of output data and bias are considered the random variables with uniform probability functions. Using the new proposed method, the optimal hyperplane regression can be obtained by solving a quadrati...
متن کاملDESIGN OF MINIMUM SEEPAGE LOSS IRRIGATION CANAL SECTIONS USING PROBABILISTIC SEARCH
To ensure efficient performance of irrigation canals, the losses from the canals need to be minimized. In this paper a modified formulation is presented to solve the optimization model for the design of different canal geometries for minimum seepage loss, in meta-heuristic environment. The complex non-linear and non-convex optimization model for canal design is solved using a probabilistic sear...
متن کاملUsing Probabilistic-Risky Programming Models in Identifying Optimized Pattern of Cultivation under Risk Conditions (Case Study: Shoshtar Region)
Using Telser and Kataoka models of probabilistic-risky mathematical programming, the present research is to determine the optimized pattern of cultivating the agricultural products of Shoshtar region under risky conditions. In order to consider the risk in the mentioned models, time period of agricultural years 1996-1997 till 2004-2005 was taken into account. Results from Telser and Kataoka mod...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1704.01087 شماره
صفحات -
تاریخ انتشار 2017